databasemanagementfandomcom-20200214-history
Normalization
Normalization Normalizationhttp://dev.mysql.com/tech-resources/articles/intro-to-normalization.html is a design technique that is widely used as a guide in designing relational databases. Normalization is basically a two step process that puts data into tabular form by removing repeating groups and then removes duplicate data from the relational tables. Normalization is based on the concepts of normal forms. A relational table is a normal form if it's satisfied with certain constraints. There are currently five normal forms for normalizationhttp://www.datamodel.org/NormalizationRules.html(first normal form, second normal form, third normal form and so on). Basic Concepts of Normalization The goal of normalization is to have relational tables free of redundant data and that can be correctly modified with consistency. If this holds true, then all relational databases should be in the third normal form. The first two normal forms are proceeding steps to get the relational database into the third normal form and achieve the goal of it getting there. Functional dependencies help understand the second normal form and any normal form there after. Functional dependencies are to make sure that data in certain tables are precisely correct and are associated with correct data in other tables at any given time. For example, column A of the relational table S is functionally dependent upon column X of table S if and only if value X in table S is only associated with one value of A at a given time. Normalization is the process of removing redundant data from relational tables by decomposing the tables into smaller tables by projection. First Normal Form A relational table is considered to be in the first normal form from the start. All values of the column are atomic, which means it contains no repeating values. Second Normal Form The second normal form means that only tables with composite primary keys can be in the first normal form, but not in the second normal form. A relational table is considered in the second normal form if it is in the first normal form and that every non-key column is fully dependent upon the primary key. The process of moving from a first normal form into the second normal form consists of five steps which include: 1. Identify any determinants other than the composite key, and the columns they determine. 2. Create and name a new table for each determinant and the unique columns it determines. 3. Move the determined columns from the original table to the new table. The determinate becomes the primary key of the new table. 4. Delete the columns you just moved from the original table except for the determinate which will serve as a foreign key. 5. The original table may be renamed to maintain semantic meaning. Third Normal Form A relational table is considered in the third normal form if all columns in the table are dependent only upon the primary key. The five step process for transforming into a third normal form are as follows: 1. Identify any determinants, primary key, and the columns they determine. 2. Create and name a new table for each determinant and the unique columns it determines. 3. Move the determined columns from the original table to the new table. The determinate becomes the primary key of the new table. 4. Delete the columns you just moved from the original table except for the determinate which will serve as a foreign key. 5. The original table may be renamed to maintain semantic meaning. The third normal form is where the relational tables should be because they have the advantage of eliminating redundant data which saves space and reduces manipulation anomalies. ---- Advanced Normalization Most experts agree that normalization past 3NF is a waste of resources because it yields few tangible benefits, but it is important to understand the problems presented past 3NF because they do occasionally occur in real life situations. Boyce-Codd Normal Form (BCNF) This is a more robust version of 3NF that occurs only under specific circumstances. There must be multiple candidate keys, one of the keys must be composite, and the candidate keys must overlap. In order to normalize the relation the developer must pick a determinant in which one column is fully functionally dependent upon. Then he must create a second relation so that every determinant is a candidate key. Fourth Normal Form (4NF) Issues with 4NF occur when a relation is in BCNF and all multi-valued dependences are also functional dependencies. For example, “Suppose that employees can be assigned to multiple projects. Also suppose that employees can have multiple job skills. If we record this information in a single table, all three attributes must be used as the key since no single attribute can uniquely identify an instance” (UTA). This would create anomalies that would fit under update, insert, and delete functions. In order to solve this problem, simply move each multi-valued dependency to its own table. Fifth Normal Form (5NF) 5NF addresses project-join dependency issues within tables that have been decomposed. During the normalization process large tables are broken into smaller cohesive tables. Usually a table in 4NF is considered to be in 5NF, but sometimes abstract insertion anomalies may hinder the join process of a decomposed table. If a table cannot be decomposed further, it is said to be in 5NF. Domain Key Normal Form (DKNF) Of all the advanced normalization forms, this one is by far the most academic. It was created as a “catch-all” to stop all critics of normalization of picking apart the system. Originally, 3NF was the final step to normalization, but others began trying to break it. Each form past 3NF is another one of these attempts. Finally Ron Fagan published a paper about basing normalization on domains and keys. His conclusion was that if one carefully creates tables so all domains and keys are of “logical consequence”, it should be fully normalized (Database Debunks). It cannot, however, really be proven that a table is in DKNF, while a table can always be tested to be in 5NF. Normalization Example Table 1 is in a database contains these attributes (A-O) and the dependencies are listed below. Table 1 (A,B,C,D,{E,F},{G,H,I,J},K,L,M,N,O) Dependencies: A→B,C,D,E,F,K,L C→D E→F G→H,I,J I→J K→L M→N,O First Normal Form – Removing Repeating Groups.Repeating groups are defined by {XXX} Table 1 (A,B,C,D,E,F,G,H,I,J,K,L,M,N,O) Transitive Dependencies The repeating groups are removed by making the determinate a primary key Second Normal Form – Remove Partial Functional Dependencies Table 1 (A,B,C,D,K,L,M,N,O) Table 2 (E,F) Table 3 (G,H,I,J) The Partial Functional Dependencies are removed by splitting the table into 3 different tables making the newly created primary keys the first attribute in the table Third Normal Form – Removing Transitive Dependencies Table 1 (A,B,C,D,K,M) Table 2 (K,L) Table 3 (M,N,O) Table 4 (E,F) Table 5 (G,H,I,J) →The transitive dependencies are removed by taking the transitive dependencies determinates in table 1 (K & M) and making them a foreign key in Table 1 and then creating two more table with those determinates as the primary keys. **Normalized through Boyce-Codd Normal Form** It is generally not required to normalize the database tables past Boyce-Codd Normal Form because the database is required to have some redundancy to properly connect the tables through referencing.This is called controlled redundancy. Sources: Database Debunks [http://www.utexas.edu/its/windows/database/datamodeling/rm/rm8.html University of Texas at Austin “Advanced Normalization”] -